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Abstract 

Exponential- family random graph models (ERGMs) provide a prin- 
cipled and flexible way to model and simulate features common in 
social networks, such as propensities for homophily, mutuality, and 
friend-of-a-friend triad closure, through choice of model terms (suf- 
ficient statistics). However, those ERGMs modeling the more com- 
plex features have, to date, been limited to binary data: presence 
or absence of ties. Thus, analysis of valued networks, such as those 
where counts, measurements, or ranks are observed, has necessitated 
dichotomizing them, losing information and introducing biases. 

In this work, we generalize ERGMs to valued networks. Focus- 
ing on modeling counts, we formulate an ERGM for networks whose 
ties are counts and discuss issues that arise when moving beyond the 
binary case. We introduce model terms that generalize and model 
common social network features for such data and apply these meth- 
ods to a network dataset whose values are counts of interactions. 
Keywords: p-star model; transitivity; weighted network; count data; 
maximum likelihood estimation; Conway-Maxwell-Poisson distribu- 
tion 



1 Introduction 



Networks are used to represent and analyze phenomena ranging from sex- 
ual partnerships (Morris and Kretzschmar, 1997), to advice giving in an 



office (Lazega and Pattison, 1999), to friendship relations (Goodreau, Kitts 



and Morris, 2008b Newcomb, 1961), to international relations (Ward and 



Hoff , 2007), to scientific collaboration, and many other domains (Goldenberg 
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Zheng, Fienberg, and Airoldi, 2009). More often than not, the relations of 



interest are not strictly dichotomous in the sense that all present relations are 
effectively equal to each other. For example, in sexual partnership networks, 
some ties are short-term while others are long-term or marital; friendships 
and acquaintance have degrees of strength, as do international relations; and 
while a particular individual seeking advice might seek it from some cowork- 
ers but not others, he or she will likely do it in some specific order and weight 
advice of some more than others. 

Network data with valued relations come in many forms. Observing mes- 
sages (Freeman and Freeman, 1980 Diesner and Carley, 2005), instances of 



personal interaction (Bernard, Killworth, and Sailer 



co-occurrences or common features of social actors (Zachary 1977 Batagelj 



1979-1980), or counting 



and Mrvar, 2006) produce relations in the form of counts. Measurements, 



such as duration of interaction (Wyatt, Choudhury, and Bilmes 2009) or 
volume of trade (Westveld and Hoff, 2011) produce relations in the form 
of (effectively) continuous values. Observations of states of alliance and 
war (Read, 1954) produce signed relationships. Sociometric surveys often 



produce ranks in addition to binary measures of affection (Sampson 



Newcomb 1961 



Jones, and Udry 



Exponential- 



1968 



Bernard et al. , 1979-1980 Harris, Florey, Tabor, Bearman, 



2003). 



amily random graph models (ERGMs) are generative mod- 



els for networks which postulate an exponential family over the space of net- 



works of interest (Holland and Leinhardt 1981 Frank and Strauss 



specified by their sufficient statistics (Morris, Handcock, and Hunter 



1986), 



2008), 



or, as with Frank and Strauss ( 1986 ), by their conditional independence struc- 
ture leading to sufficient statistics (Besag, 1974). These sufficient statistics 
typically embody the features of the network of interest that are believed to 
be significant to the social process which had produced it, such as degree 
distribution (e.g., propensity towards monogamy in sexual partnership net- 
works), homophily (i.e., "birds of a feather flock together"), and triad-closure 
bias (i.e., "a friend of a friend is a friend") . (Morris et al. , 2008) 

A major limitation of ERGMs to date has been that they have been 
applied almost exclusively to binary relations: a relationship between a given 
actor i and a given actor j is either present or absent. This is a serious 
limitation: valued network data have to be dichotomized for ERGM analysis, 
an approach which loses information and may introduce biases. (Thomas and 



Blitzstein, 2011) 



Some extensions of ERGMs to speciflc forms of valued ties have been 
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formulated: to networks with polytomous tie values, represented as a con- 
strained three-way binary array by Robins, Pattison, and Wasserman (1999) 



and more directly by Wyatt et al. (2009; 2010); to multiple binary net- 



works by Pattison and Wasserman (1999); and the authors are also aware of 
some preliminary work by Handcock (2006) on ERGMs for signed network 
data. Rinaldo, Fienberg, and Zhou (2009) discussed binary ERGMs as a 



special case and a motivating application of their developments in geometry 
of discrete exponential families. 

A broad exception to this limitation has been a subfamily of ERGMs that 
have the property that the ties and their values are stochastically independent 
given the model parameters. Unlike the dependent case, the likelihoods for 
these models have can often be expressed as generalized linear or nonlinear 
models, and they tend to have tractable normalizing constants, which allows 
them to more easily be embedded in a hierarchical framework. Thus, to 
represent common properties of social networks, such as actor heterogeneity, 
triad-closure bias, and clustering, latent class and position models have been 
used and extended to valued networks. (Hoff 2005; Krivitsky, Handcock, 



Raftery, and Hoff, 2009 Mariadassou, Robin, and Vacher, 2010) 



In this work, we generalize the ERGM framework to directly model val- 
ued networks, particularly networks with count dyad values, while retaining 
much of the flexibility and interpretability of binary ERGMs, including the 
above-described property in the case when tie values are independent un- 
der the model. In Section |2] we review conventional ERGMs and describe 
their traits that valued ERGMs should inherit. In Section |3} we describe 
the framework that extends the model class to networks with counts as dyad 
values and discuss additional considerations that emerge when each dyad's 
sample space is no longer binary. In Section |4] we give some details and 
caveats of our implementation of these models and briefly address the issue 
of ERGM degeneracy as it pertains to count data. Applying ERGMs re- 
quiers one to specify and interpret sufficient statistics that embody network 
features of interest, all the while avoiding undesirable phenomena such as 
ERGM degeneracy. Thus, in Section |5| we introduce and discuss statistics 
to represent a variety of features commonly found in social networks, as well 
as features specific to networks of counts. In Section |6] we use these statistics 
to model social forces that affect the structure of a network of counts of so- 
cial contexts of interactions among members of a divided karate club and a 
network of counts of conversations among members of a fraternity. Finally, 
in Section [7| we discuss generalizing ERGMs to other types of valued data. 
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2 ERGMs for binary data 



In this section, we define notation, review the (potentially curved) exponential- 
family random graph model and identify those of its properties that we wish 
to retain when generalizing. 



2.1 Notation and binary ERGM definition 

Let N be the set of actors in the network of interest, assumed known and 
fixed for the purposes of this paper, and let n = |A^| be its cardinality, or the 
number of actors in the network. For the purposes of this paper, let a dyad be 
defined as a (usually distinct) pair of actors, ordered if the network of interest 
is directed, unordered if not, between whom a relation of interest may exist, 
and let Y be the set of all dyads. More concretely, if the network of interest is 
directed, Y C N x N , and if it is not, Y C {{i,j} : e N x N}. In many 
problems, a relation of interest cannot exist between an actor and itself (e.g., 
a friendship network), or actors are partitioned into classes with relations 
only existing between classes (e.g., bipartite networks of actors attending 
events), in which case Y is a proper subset of N x N, excluding those pairs 
between which there can be no relation of interest. 
Further, let the set of possible networks of interest (the sample space 
of the model) 3^ C 2^, the power set of the dyads in the network. Then 
a network y Ey, can be considered a set of ties Again, in some 

problems, there may be additional constraints on 3^. A common example of 



such constraints are degree constraints induced by the survey format (Harris 



et al., 2003; Goodreau et al. 2008b) 



Using notation similar to that of 


Hunter and Handcock 


(2006 


) and 


Kriv- 


itsky, Handcock, and Morris 


( 


2011 


, define an exponential family random 



graph model to have the form 



exp(r7(6>) ■g{y;x)) 



for random network variable Y and its realization y ; model parameter vector 
G (for parameter space C M^) and its mapping to canonical parame- 
ters 77 : — )■ W; a vector of sufficient statistics g : y ^ MP, which may also 
depend on data x, assumed fixed and known; and a normalizing constant (in 
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y) K^ g : — i- M which ensures that ([T]) sum to 1 and thus has the value 



K. 



r,,g{e; x) exp {r]{e) ■ g{y'; x)) . 

y'ey 



Here, we have given the most general case defined by Hunter and Handcock 



(2006): a frequently used special case is q = p and ri{6) = 6, so the ex- 
ponential family is linear. For notational simplicity, we will omit x for the 
remainder of this paper, as g incorporates it implicitly. 



2.2 Properties of binary ERGM 

2.2.1 Conditional distributions and change statistics 



Snijders, Pattison, Robins, and Handcock 


(2006 


), 


Hunter, Handcock, Butts, 


Goodreau, and Morris 


(2008b 


), 


Krivitsky et al. 


(2011 


), and others define 



change statistics, which emerge when considering the probability of a single 
dyad having a tie given the rest of the network and provide a convenient 
local interpretation of ERGMs. To summarize, define the p-vector of change 
statistics 

^t,j9{y) = g{y + {hj)) -g{y - ihj)), 

where y + {i,j) is the network y with edge or arc {i,j) added if absent (and 
unchanged if present) and y — {i,j) is the network y with edge or arc {i,j) 
removed if present (and unchanged if absent). Then, through cancellations. 



Pr, 



1\Y - (2, j) = y- = logit-^ (riie) ■ A,,My)) ■ 



It is often the case that the form of A.ijg{y) is simpler than that of g{y) both 
algebraically and computationally. For example, the change statistic for edge 
count \y \ is simply 1, indicating that a unit increase in T]^y^{0) will increase the 
conditional log-odds of a tie by 1, while the change statistic for the number 
of triangles in a network is | y^ 2/j 1 5 the number of neighbors i and j have in 
common, suggesting that, a positive coefficient on this statistic will increase 
the odds of a tie between i and j exponentially in the number of common 



neighbors. Hunter et al. (2008b) and Krivitsky et al. (2011) offer a further 



discussion of change statistics and their uses, and Snijders et al. (2006) and 



Schweinberger (2011) use them to diagnose degeneracy in ERGMs. It would 
be desirable for a generalization of ERGM to valued networks to facilitate 
similar local interpretation. 
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Furthermore, the conditional distribution serves as the basis for maximum 
pseudo-hkehhood estimation (MPLE) for these models. (Strauss and Ikeda 
T990| 



2.2.2 Relationship to logistic regression 

If the model has the property of dyadic independence discussed in the Intro- 
duction, or, equivalently, the change statistic Aijg{y) is constant in y (but 
may vary for different {i,j)), the model trivially reduces to logistic regres- 



sion. In that case, the MLE and the MPLE are equivalent. (Strauss and 



Ikeda, 1990) Similarly, it may be a desirable trait for valued generalizations 



of ERGMs to also reduce to GLM for dyad-independent choices of sufficient 
statistics. 



3 ERGM for counts 

We now define ERGMs for count data and discuss the issues that arise in 
the transition. 



3.1 Model definition 

Define A^, n, and Y as above. Let Nn be the set of natural numbers and 0. 



Here, we focus on counts with no a priori upper bound — or counts best 
modeled thus. Instead of defining the sample space 3^ as a subset of a power 
set, define it as 3^ C Nq, a set of mappings that assign to each dyad {i,j) e Y 
a count. Let y^j = y{i,j) G No be the value associated with dyad {i,j). 

A (potentially curved) ERGM for a random network of counts Y E y 
then has the pmf 

(V - ,A - ^(l/)exp {r]{e)-g{y)) 

t^'^e;h,ri,9\.^ - y) - > l^J 

l^h,r,,g{t>) 

where the normalizing constant 

i^h,r,,g{0) = h{y) exp (77(0) ■ g{y)) , 

yey 

with T], g, and defined as above, and 

C Gn = {6>' G M'' : Kh,r,,9{^') < 00} (3) 
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dBarndorff-Nielsenl [l978| pp. 115-116; |Brown[ [l986l pp. 1-2), with ©n be 



ing the natural parameter space. Notably, constraint ([s]) is trivial for binary 
networks, since their sample space is finite, whereas for counts ([s]) may con- 
stitute a relatively complex constraint. 

For the remainder of this paper, we will focus on linear ERGMs, so unless 
otherwise noted, p = q and r]{0) = 0. 



3.2 Reference measure 

In addition to the specification of the sufficient statistics g and, for curved 
families, mapping r) of model parameters to canonical parameters, an ERGM 
for counts depends on the specification of the function : 3^ — )■ [0, oo). For- 
mally, along with the sample space, it specifies the reference measure: the 
distribution relative to which the exponential form is specified. For binary 
ERGMs, h is usually not specified explicitly, though in some ERGM appli- 



cations, such as models with offsets (Krivitsky et al. , 2011, for example) and 
profile likelihood calculations of Hunter et al. (2008b), the terms with fixed 



parameters are implicitly absorbed into h. 

For valued network data in general, and for count data in particular, 
specification of h gains a great deal of importance, setting the baseline shape 
of the dyad distribution and constraining the parameter space. Consider a 
very simple p = 1 model with g{y) = j)eY Vij)^ dyad 
values. If h{y) = 1 (i.e., discrete uniform), the resulting family has the pmf 



Pr.;M,.(^ - ?/) - Z-^^ 11 1 _ exp (6) 



K.h,r,,g{0) 



i.i.d. 



giving the dyadwise distribution Yij '~ ' Geometric(j9 = 1— exp (6)), with 6 < 
by On the other hand, suppose that, instead, h{y) = 11(1 j)eY(2/jjO~^- 
Then, 



Pre;,,, Ay y) .,,,,(0)n,,),,i/,,! ^ll^^,,^.!exp(^) 



giving Yj ~ ■ Poisson(/i = exp {0)), with ©n = M. The shape of the resulting 
distributions for a fixed mean is given in Figure [T} 
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Figure 1: Effect of h on the shape of the distribution. (The mean is fixed at 
2.) 

The reference measure h thus determines the support and the basic shape 
of the ERGM distribution. For this reason, we define a geometric-reference 
ERGM to have the form (|2]) with h{y) = 1 and a Poisson-reference ERGM 

to have h{y) = U(i,j)eYiyi,j-)'^- 

Note that this does not mean that any Poisson-reference ERGM will, even 
under dyadic independence, be dyadwise Poisson. We discuss the sufficient 
conditions for this in Section 15.2. 1[ 



4 Inference and implementation 

As exponential families, valued ERGMs, and ERGMs for counts in particular, 
inherit the inferential properties of discrete exponential families in general 
and binary ERGMs in particular, including calculation of standard errors 
and analysis of deviance. They also inherit the caveats. For example, the 
Wald test results based on standard errors depend on asymptotics which 



are questionable for ERGMs with complex dependence structure (Hunter 



and Handcock, 2006), so we confirm the most important of the results using 
a simple Monte Carlo test: we fit a nested model without the statistic of 
interest and simulate its distribution under such a model. The quantile of 
the observed value of the statistic of interest can then be used as a more 
robust P- value. 
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At the same time, generalizing ERGMs to counts raises additional infer- 
ential issues. In particular, the infinite sample space of counts means that the 
constraint ^ is not always trivially satisfied, which results in some valued 
ERGM specifications not fulfilling regularity conditions. We give an example 
of this in Section |5.2.3 and Appendix [Bj These issues also lead to additional 
computational issues. 



4.1 Computational issues 



The greatest practical difficulty associated with likelihood inference on these 
models is usually that the normalizing constant Kh,T],g{0) is intractable, its 
exact evaluation requiring integration over the sample space 3^. However, the 
exponential-family nature of model also means that, provided a method exists 
to simulate realizations of networks from the model of interest given a par- 



ticular 6, the methods of Geyer and Thompson (1992) for fitting exponential 



families with intractable normalizing constants and, more specifically, their 



application to ERGMs by Hunter and Handcock (2006), may be used. These 



methods rely on network sufficient statistics rather than networks themselves 
and can thus be used with little modification. More concretely, the ratio of 
two normalizing constants evaluated at 0' and can be expressed as 



K-h,T,,g{0) 



l^h,ri,g{0) 

Eyey Ky) exp {{ri{0') - 'n{0)) ■ g{y)) exp {r){0) ■ g{y)) 



K-h,rj,g{0) 



Sr^ (( (Q'\ (QW ( ^^h{y)exp{r]{0)-g{y)) 
= V exp {{r]{0 ) - 77(6/)) ■ g{y)) — 

'^h,ri,g[P) 

= Ee.,,,r,,g{ewi{rii0')-ri{0))-g{Y))), 
so given a sample Y^^\ . . . , Y^^'^ from an initial guess 0, it can be estimated 



'^^^^^^f2^xp({ri{0')-r,{0)).giY 

f^h,r,,g[0) ^ V 



Another method for fitting ERGMs, taking advantage of the equivalence 
of the method of moments to the maximum likelihood estimator for linear 
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exponential families, was implemented by Snijders (2002), using the algo 



rithm by Robbins and Monro (1951) for simulated statistics to fit the model. 



This approach also trivially extends to valued ERGMs. 

Furthermore, because the normalizing constant (if it is finite) is thus 
accommodated by the fitting algorithm, we may focus on the unnormalized 
density for the purposes of model specification and interpretation. Therefore, 
for the remainder of this paper, we specify our models up to proportionality. 



as 



Geyer (1999) suggests. 



That (|3j) is not trivially satisfied for all G presents an additional 
computational challenge: even for relatively simple network models, may 
have a nontrivial shape. For example, even a simple geometric-reference 
ERGM 

PTe;h,r,,g{Y = ?/) OC JJ CXp {O ■ Xi^y^j) , 

(ij)6Y 

a geometric GLM with a covariate p- vector Xij, has parameter space 



& = {e'eW: yu,j)eY0 



X 



<0}, 



an intersection of up to |Y| half-spaces (linear constraints). Models with 
complex dependence structure may have less predictable parameter spaces, 
and, due to the nature of the algorithm of Hunter and Handcock (2006), the 



only general way to detect whether a guess for had strayed outside of 
may be by diagnostics on the simulation. Bayesian inference with improper 
priors faces a similar problem, and addressing it in the context of ERGMs 
is a subject for future work. For this paper, we focus on models in which 
parameter spaces are provably unconstrained or have very simple constraints. 

We base our implementation on the R package ergm for fitting binary 



ERGMs. (Handcock, Hunter, Butts, Goodreau, Krivitsky, and Morris 2010) 



The design of that package separates the specification of model sufficient 



statistics from the specification of the sample space of networks (Hunter et al 



2008b), and so we implement our models by substituting in a Metropolis- 
Hastings sampler that implements our y and h of interest. (A simple sam- 
pling algorithm for realizations from a Poisson-reference ERGM, optimized 
for zero- inflated data, is described in Appendix |Aj) This implementation will 
be incorporated into a future public release of ergm. 
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4.2 Model degeneracy 



Application of ERGMs has long been associated with a complex of problems 
collectively referred to as "degeneracy". (Handcock, 2003 Rinaldo et al. 



2009; Schweinberger 2011) Rinaldo et al. in particular, list three specific, 
interrelated, phenomena: 1) when a parameter configuration — even the 
MLE — induces a distribution for which only a small number of possible 
networks have non-negligible probabilities, and these networks are often very 
different from each other (e.g., a sparser-than-observed graph and a complete 
graph) for an effectively bimodal distribution; 2) when the MLE is hard to 
find by the available MCMC methods; and 3) when the probability of the 
observed network under the MLE is relatively low — the observed network is, 
effectively, between the modes. This bimodality and concentration is often a 
consequence of the model inducing overly strong positive dependence among 
dyad values. For example, Snijders et al. (2006) use change statistics to show 
that under models with positive coefficients on triangle and fc-star {k > 2) 
counts — the classic "degenerate" ERGM terms — every tie added to the 
network increases the conditional odds of several other ties and does not 



decrease the odds of any, creating what Snijders et al. call an "avalanche" 
toward the complete graph, which emerges as by far the highest-probability 
realization. (More concretely, under a model with a triangle count with 
coefficient ^a, adding a tie increases the conditional odds of as many 
ties as i and j have neighbors by exp (^a)-) Adjusting other parameters, such 
as density, down to obtain the expected level of sparsity close to that of the 
observed graph merely induces the bimodal distribution of Phenomenon 1. 

An infinite sample space makes Phenomenon 1, as such, unlikely, because 
the "avalanche" does not have a maximal graph in which to concentrate. 
However, it does not preclude excessive dependence inducing a bimodal dis- 
tribution at the MLE, even if neither mode is remotely degenerate in the 
probabilistic sense. The observed network being between these modes, this 
may lead to Phenomenon 3, and, due to the nature of the estimation al- 
gorithms, such a situation may, indeed, lead to failing estimation — Phe- 
nomenon 2. 

In this work, we seek to avoid this problem by constructing statistics 
that prevent the "avalanche" by limiting dependence or employing counter- 
weights to reduce it. (An example of the former approach is the modeling 
of transitivity in Section 5.2.6[ and an example of the latter is the centering 
in the within-actor covariance statistic developed in Section 5.2.5 ) Formal 
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diagnostics developed to date, such as those of Schweinberger (2011) do not 
appear to be directly applicable to models with infinite sample spaces, so 



we rely on MCMC diagnostics (Goodreau, Handcock, Hunter, Butts, and 



Morris, 2008a) instead. 



5 Statistics and interpretation for count data 

In this section, we develop sufficient statistics for count data to represent 
network features that may be of interest and discuss their interpretation. In 
particular, unless otherwise noted, we focus on the Poisson-reference ERGM 
without complex constraints: y = Nq and h{y) = j)eY(?/j,jO~^- 

5.1 Interpretation of model parameters 

The sufficient statistics of the binary ERGMs and valued ERGMs alike em- 
body the structural properties of the network that are of interest. The tools 
available for interpreting them are similar as well. 



5.1.1 Expectations of sufficient statistics 

In a linear ERGM, if ©n is an open set, then, for every k G l..p, and 
holding dk', k' ^ k, fixed, it is a general exponential family property that the 



expectation E0.h,ri,g{gkO^)) strictly increasing in Ok- ( Barndorff- Nielsen 



1978, pp. 120-121) Thus, if the statistic gr^ is a measurement of some feature 
of interest of the network (e.g., magnitude of counts, interactions between or 
within a group, isolates, triadic structures), a greater value of 6k results in a 
distribution of networks with more of the feature measured by present. 



5.1.2 Discrete change statistic and conditional distribution 

Binary ERGM statistics have a "local" interpretation in the form of change 
statistics summarized in Section 2.2.1 we describe similar tools for "local" 
interpretation of ERGMs for counts here. 
Define the set of networks 

yidiv) = {y' (^y-- ^i',j'€Y\{(i,j)}y'i'j' = y^>,f}■ 

That is, {y) is the set of networks such that all dyads but the focus dyad 
{i-ii) are fixed to their values in y while («, j) itself may vary over its possible 
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values; and define t/(j j^^^ G {y' G yi,j{y) '■ y'ij = k} (a singleton set) to be 
the network with non-focus dyads fixed and focus dyad set to k. Then, let 
the discrete change statistic 

Ki^'^'giy) = g{y{^,j)=k,) - 9{y(ij)=k,)- 

This statistic emerges when taking the ratio of probabilities of two net- 
works that are identical except for a single dyad value: 

Pre;M.g(y»,j = y{i,j)=k,\Y G y^jjy)) ^ hi^,{k2) , . s N 

where hij : No — )■ M is the component of h associated with dyad {i,j), such 
that h{y) = 11(4 j)eY '^^^ ^e thus factored. For a Poisson- 

reference ERGM, hij{k) = {k\)^^. This may be used to assess the effect 
of a particular ERGM term on the decay rate of the ratios of probabil- 



ities of successive values of dyads (Shmueli, Minka, Kadane, Borle, and 



Boatwright, 2005) and on the shape of the dyadwise conditional distribu- 



tion: the conditional distribution of a dyad (i, j) G Y, given all other dyads 

{^\f)eY\{{^,J)}, 

IV ^^w hij{yij)exp{0-g{y)) 
Pr....(n. - y.jY G - E....,.)M.i^.exp(..,(,0) 

/^M(2/M)exp(0-A^,p'^g(^)) 

for an arbitrary baseline k^. 



5.2 Model specification statistics 

We now propose some specific model statistics to represent common network 
structural properties and distributions of counts. 



5.2.1 Poisson modeling 

We begin with statistics that produce Poisson-distributed dyads and model 
network phenomena that can be represented in a dyad-independent manner. 
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As a binary ERGM reduces to a logistic regression model under dyadic in- 
dependence, a Poisson-reference ERGM may reduce to a Poisson regression 
model. 

In a Poisson-reference ERGM, the normalizing constant has a simple 
closed form if g{y') is linear in y'-j and does not depend on any other dyads 



for Xij = Ai~f'''^^g{y) for any k G Nq. Then, 



y,,/~- Poisson = exp [O ■ A'',-''g{y))) 



giving a Poisson log-linear model, and A°^^gr effectively becomes the covari 



ate vector for Yij. (If g{y') is linear in y'^ j but does depend on other dyads 
— Xij in Q depends on y'-,j, but not on y[ ,j itself — the dyad distribution 
is conditionally Poisson but not marginally so. An example of this arises in 
Se ction |5.2.4[) 



Morris et al. (2008) describe many dyad-independent sufficient statistics 



for binary ERGMs. All of them have the general form 

9kiy) = Yl yid^id,k, (5) 

(ij)eY 

where Xij^k = ^ijgk ^^'^ ^i,j,k may be viewed as exogenous (to the model) 
covariates in a logistic regression for each tie. They could then be used to 
model a variety of patterns for degree heterogeneity and mixing among actors 
over (assumed) exogenous attributes. For example, for a uniform homophily 
model, Xij^k may be an indicator of whether i and j belong to the same group. 
If y^ j are counts, these statistics induce a Poisson regression type model (for 
a Poisson-reference ERGM), where the effect of a unit increase in some 0k on 



dyad (i, j) is to increase its expectation by a factor of exp (ajj j ^). Krivitsky 



et al. (2009) use this type of model Slovenian periodical "co-readerships" 



(Batagelj and Mrvar, 2006) — numbers of readers who report reading each 
pair of periodicals of interest — using as exogenous covariates the class of 
periodical (daily, weekly, regional, etc.) and the overall readership levels of 
each periodical. 

Curved (i.e., r]{6) ^ 6, p > q, and rj not a linear mapping) ERGMs, 
in which the g satisfy ^ and dyadic independence, may induce nonlinear 
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Poisson regression. An example of this is the hkehhood component of some 
latent space network models, with latent space positions being treated as free 
parameters: for example, the likelihoods of the Poisson models of Hoff (2005) 
and Krivitsky et al. ( |2009 ) are special cases of such an ERGM, with r]{0) = 
(^ij(^))(ij)eY 3{y) = (2/ij)(ij)eY sufficient statistic is the 

network), and iJiji^) mapping latent space positions and other parameters 
contained in to the logarithms of dyad means (i.e., the dyadwise canonical 
parameters) . 



5.2.2 Zero modification 

We now turn to model terms that may reshape the distribution of the counts 
away from Poisson. Social networks tend to be sparse, and larger networks 



of similar nature tend to be more sparse (Krivitsky et al. , 2011). If the 
interactions among the actors are counted, it is often the case that if two 
actors interact at all, they interact multiple times. This leads to dyadwise 
distributions that are zero-inflated relative to Poisson. 

These features of sparsity can be modeled using statistics developed for 
binary ERGMs, applied to a network produced by thresholding the counts 
(at 1, for zero- modification). For example, a Poisson-reference ERGM with 
p = 2 and 



9[y) 




T 



has dyadwise distribution 

PTe;h,r,,g{Y = t/) oc JJ cxp (Oiy^j + 6>2ly,,.>o) /v 



This is a parametrization of a zero- modified Poisson distribution (Lambert 



|1992 ), though not a commonly used one, with the probability of being 



(H-exp (^2) (exp (exp (^i)) — 1))~^ and nonzero values being distributed (con- 
ditionally on not being 0) Poisson(/i = exp {Oi)), both reducing to Poisson's 
when 02 = 0. Notably, the probability of decreases as 0i increases, rather 
than being solely controlled by 02- 
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5.2.3 Dispersion modeling 



Consider the social network of face-to-face conversations among people living 
in a region. A typical individual will likely not interact at all with vast ma- 
jority of others, have one-time or infrequent interaction with a large number 
of others (e.g., with clerks or tellers), and a lot of interaction with a rela- 
tively small number of others (e.g., family, coworkers). Some of this may be 
accounted for by information about social roles and preexisting relationships, 
but if such information is not available, this leads to a highly overdispersed 
distribution relative to Poisson, or even zero-inflated Poisson. Overdispersed 



counts are often modeled using the negative binomial distribution. (McCul 



lagh and Nelder, 1989, p. 199) However, the negative binomial distribution 
with an unknown dispersion parameter is not an exponential family, mak- 
ing it difficult to fit using our inference techniques. We thus discuss two 
purely exponential-family approaches for dealing with non-Poisson-dispersed 
interaction counts in general and overdispersed counts in particular. 

Conway— Maxwell— Poisson Distribution Conway-Maxwell-Poisson (CMP) 



distribution (Shmueli et al. , 2005) is an exponential family for counts, able 
to represent both under- and overdispersion: adding a sufficient statistic of 
the form 

to a Poisson-reference ERGM otherwise fulfilling conditions for Poisson re- 



gression described in Section [5 . 2 . 1 1 turns a Poisson regression model for dyads 
into a CMP regression model. 

Its coefficient, ^cmp, constrained by ^ to ^cmp < 1, controls the degree 
of dispersion: ^cmp = retains the Poisson distribution; ^cmp < induces 
underdispersion relative to Poisson, approaching the Bernoulli distribution as 
^CMP — ^ — oo; and ^cmp > induces overdispersion, attaining the geometric 
distribution at ^cmp = 1, its most overdispersed point. 

Normally, the greatest hurdle associated with using CMP is that its nor- 
malizing constant does not, in general, have a known closed form. In our case, 
because intractable normalizing constants are already accommodated by the 
methods of Section |4| so using CMP in this setting requires no additional 
effort. 

At the same time, CMP is neither regular nor steep (per Appendix [B|, 
so the properties of its estimators are not guaranteed, particularly for highly 
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overdispersed data. We have found experimentally that counts as dispersed 
as geometric distribution or more so often cause the fitting methods of Sec- 
tion m to fail. 



Variance-like parameters Some control over the variance can be attained 
by adding a statistic of the form g{y) = Yli{i j)&iVi,j^ a ^ 1. Statistics with 
a > 1, s uch as g{y) = J2(i j )eY vXj^ suffer the same problem as a Strauss point 
process (Kelly and Ripley, 1976): for any 6',e > 0, limy_^oo exp(6'?/^+'')/?/! 



oo, leading to (|3j) constraining 6* < 0, able to represent only underdispersion. 
Thus, we propose to model dispersion by adding a statistic of the form 



9iy)= E 

(ij)6Y 



1/2 

y^:J 



(7) 



To the extent that the counts are Poisson-like, the square root is a variance- 
stabilizing transformation (McCuUagh and Nelder, 1989, p. 196). Then, a 
model with p = 2 and dyadwise sufficient statistic 



9iy) 




yi,p Yl y^^j 

(i,i)eY 



(8) 



may be viewed as a modeling the first and second moments of Jyi^y 
the highest-order term is still on the order of 7/^ ^ guarantees that = 
a practical advantage over CMP. 

As with CMP, the normalizing constant is intractable. To explore the 
shape of this distribution, we fixed di at each of a range of values and found 
02S such that the induced distribution had the expected value of 1. We 
then simulated from the fit. The estimated pmf for each configuration and 
the comparison with the geometric distribution with the same expectation 
is given in Figure |2} Smaller coefficients on ([T]) (^i) correspond to greater 
dispersion, with coefficients on dyad sum (^2) increasing to compensate, and 
vice versa, with = corresponding to a Poisson distribution. As the dis- 
persion increases, the mean is preserved in part by increasing Pr(yjj = 0) 
and, for sufficiently high values of y^^, the geometric distribution still domi- 
nates. Thus, there is a trade-off between the convenience of a model without 
complex constraints on the parameter space and the ability to model greater 
dispersion. In practice, if the substantive reasons for over dispersion are due 
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Figure 2: Dyadwise distributions attainable by the model (|8j). Because 
Pt[Y = 0) varies greatly for different 0i yet can be adjusted separately by 
an appropriate model term, we plot the probabilities conditional on y > 0. 



to unaccounted-for heterogeneity, the latter might not be a serious disadvan- 



tage, and excess zeros can be compensated for by a term from Section 5.2.2 



5.2.4 Mutuality 

Many directed networks, such as friendship nominations, exhibit mutuality — 
that, other things being equal, if a tie (i, j) exists, a tie (j, i) is more likely to 
exist as well — and binary ERGMs can model this phenomenon using a suffi- 
cient statistic g{y) = Y.{i,j)(^Y,i<jyi,jy j,i = J2(ij)eY,i<i^Hyt,j , yj,i), count- 



ing the number of reciprocated ties. (Holland and Leinhardt 



sufficient statistics that can model it include g{y) = ^ 



9{y) = E 



1981D Other 
and 



the counts of asymmetric and symmetric dyads. 
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respectively. (Morris et al. , 2008) 



In the presence of an edge count term, these three are simply different 
parametrizations of the same distribution family: 



{Vi,j + y^d - h.,,^v,, + y. 



3,1'' 



1 + 1 



y^,.,=y.i 



Nevertheless, these three different statistics suggest two major ways to gen- 
eralize the terms to count data: by evaluating a product or a minimum of 
the values, or by evaluating their similarity or difference. We discuss them 
in turn. 



Product It is tempting to model mutuality for count data in the same 
manner as for binary data, with y^j and yj^ being values rather than indi- 
cators. For example, a simple model with overall dyad mean and reciprocity 
terms, with p = 2 and 



T 



9[y) = I Yl y^^v Y y^jy. 

,{ij)G¥ (i,j)&Y,i<j 



would have a conditional Poisson distribution: 

Yij = VijlY e yi,j{y) ~ Poisson [fi = exp (^i 



a desirable property. However, because for any c > 0, limj;_>.oo exp(c?/^) / (y!)^ = 
oo, for 02 > 0, representing positive mutuality, ^ is not fulfilled. (Note that 
the expected value of Yij is exponential in the value of Yj^i and vice versa. 



Again, a Strauss point process exhibits a similar problem. (Kelly and Ripley 



1976)) 



Geometric mean As with dispersion, the problem can be alleviated by 
using the geometric mean of y^j and yj^ instead of their product. As in 
Section 5.2.3 this choice may be justified as an analog of covariance on 
variance-stabilized counts. This changes the shape of the distribution in 
ways that are difficult to interpret: if 



9[y) 



Y Y Vy^i 

\(iJ)6Y (ij)eY,i<i 
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then 

and, with nonzero yj^, the probabihties of greater values of Yij are inflated 
by more. The analogy to covariance further suggests centering the statistic: 

{i,j)eY,i<j 

for 

Minimum An alternative generalization is to take the minimum of the two 
values. For example, if 

\{i,j)eY {i,j)eY,i<j J 

then 

Pr0.h,r,,9{Yij = y^jlY e yijiy)) oc exp {diy^j + 6>2 min(2/i_^. - t/^-^, 0)) /y^J. 

(10) 

Thus, a possible interpretation for this term is that the conditional proba- 
bility for a particular value of Vjj, y^j is deflated by exp (^2) for every unit 
by which y^j is less than y^- j. In a sense, yj ^ "pulls up" y^^ j to its level and 
vice versa. 



Negative difference Generalizing the concept of similarity between y^j 
and j leads to a statistic of difference between their values. We negate it 
so that a positive coefficient value leads to greater mutuality. Then, 

\{i,j)eY {i,j)eY,i<j ) 

and 
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Figure 3: Effect of proposed mutuality statistics (g^) with parameter 0^ > 
on the distribution of Yij, given that Yj^i = y^ ^. Whereas the min(t/jj-, y^^) 
statistic deflates the probabilities of those values of y^j that are less than y^ j, 
thus inflating all of those of y^j above or equal to it, thus "pulling Yij up", 
the — — t/j j| statistic deflates the probabilities in both directions away 
from t/^- j, thus inflating those that are the closest, "pulling Yij in". ^Jy~y~^ 
inflates greater values of y^j in general, inflating by more for greater ^JyJ^i- 



so the conditional probability of a particular y^ ^ is deflated by exp (^2) for 
every unit difference from y^- j, in either direction. Thus, y^ j "pulls in" t/^j 
and vice versa. Of course, other differences (e.g., squared difference) are also 
possible. 

We use the discrete change statistic to visualize the differences between 
these variants in Figure jsj plotting the 6^A^^^''^g^(y) summand of 



log 



Pr 



0;h,ri,g 



'Y, 



y.JYeyM) 



Pl'0;/i,r7,g(^i,j 



0\Yey,,iy)) 



■ A 



for each variant. Lastly, while the conditional distributions, and hence the 
parameter interpretations for the minimum and the negative difference statis- 
tic, are different, models induced by (10) and (11 ) are also reparametrizations 
of each other: mind/^^^-, y^- J = \ {{y^ j + yj^,)^yij - y^^.l). 



5.2.5 Actor heterogeneity 

Another property found in social networks is that different individuals have 
different overall propensities to have ties: activity, popularity, and (undi- 
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rected) sociality. Some of this heterogeneity may be accounted for by ex- 
ogenous covariates. For the unaccounted-for heterogeneity, two major ap- 
proaches have been used: conditional, in which actor-specific parameters are 
added to the model to absorb it, with dyadic independence typically assumed 
given these parameters, and marginal, in which statistics are added that rep- 
resent the effects of heterogeneity on the overall network features. Examples 
of the conditional approach include the very first exponential-family model 
for networks, the pi, which included a fixed effect for every actor (Holland 



and Leinhardt, 1981); and the p2 model and latent space models, which used 



using random effects 1 


van Duijn, Snijders, and Zijlstra 


2004 


Hoff 


2005 


Krivitsky et al. 


2009 


Mariadassou et al. , 


2010 


). The marginal approach 


includes the /c-star statistics for > 2 ( 


Frank and Strauss 


1986 


), which, for 



a fixed network density, become more prevalent as heterogeneity increases at 
the cost of often inducing degeneracy; alternating /c-stars and geometrically 



weighted degree statistics (Snijders et al. , 2006 Hunter and Handcock, 2006) 



which attempt to remedy the degeneracy of /c-stars; and statistics such as 
the square root degree activity/popularity, which sum of each actors' degree 
taken to 3/2 power, which also increases with greater heterogeneity, but not 



as rapidly as /c-stars do (Snijders, van de Bunt, and Steglich, 2010), avoiding 



degeneracy. In the conditional approach, using fixed effects lacks parsimony 
and using random effects creates a problem with a doubly-intractable nor- 
malizing constant, beyond the scope of this paper, so we develop a marginal 
approach here. 

Actor heterogeneity may be viewed marginally as positive within-actor 
correlation among the dyad values. Following the discussion in the previous 
sections, we propose a form of pooled within-actor covariance of variance- 
stabilized dyad values, scaled to the same magnitude as the dyad sum: 



9{y) 



1 



n 



y) 



(12) 



for Yj_> being the set of actors to who whom i may have ties (= {j' : G 
3^}) and ^/y defined as in ([9]). This statistic would increase with greater out- 
tie heterogeneity, an analogous statistic could be specified for in-tie hetero- 
geneity, and dropping the directionality would produce an undirected version 
of this statistic. 

We have considered other variants, including the uncentered version 



m 



which each summand in (12) is simply ^Jyl^ylj^- We found that in undi- 
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rected networks in particular, such a model term can induce a degeneracy- 
like bimodal distribution of networks. (This is likely because in undirected 
networks, the positive dependence is not contained within each actor, so 
subtracting ^Jy serves as a counterweight to avert the "avalanche" . ) 



5.2.6 Triad-closure bias 



We now turn to the question of how to represent triad-closure bias — friend- 
of-a-friend effects — in count data. As with mutuality, merely multiplying 
values of the dyads in a triad leads to a model which cannot have positive 
triad closure bias. In addition, ERGM sufficient statistics that take counts 



over triads often exhibit degeneracy. (Schweinberger , 2011 ) For these reasons, 



we describe a family of statistics that sum over dyads instead. Wyatt et al. 



(2010) use a generalization of the curved geometrically- weighted edgewise 



shared partners (GWESP) statistic (Hunter and Handcock, 2006), though it 
is not clear whether it is suitable for data with an infinite sample space. We 
thus describe a more conservative family of statistics. 

One term used to model triad closure in binary dynamic networks by 



Snijders et al. (2010) is the transitive ties effect, the most conservative special 



case of the GWESP (Hunter and Handcock, 2006) statistic. This statistic 



counts the number of ties (z,j) such that there exists at least one path of 
length 2 (two-path) between them — a third actor k such that t/j = j = 1. 
(Unlike the triangle count, each dyad may contribute at most -|-1 to the 
statistic, no matter how many such ks exist.) 
One generalization of this statistic to counts is 



9{y) = m.ml^ yij, max {mm{yi^k, Vkj)) 



(13) 



Intuitively, define the strength of a two-path from i to j to be the minimum of 
the values along the path. The statistic is then the sum over the dyads (i, j) 
of the minimum of the value of and the value of the strongest two-path 
between. The interpretation is thus somewhat analogous to that of the min- 
imum mutuality statistic, with j/^ j replaced by maxfceAr(min(j/j Vkj))- The 
motivation for using minimum, as opposed to negative absolute difference, 
to combine the two-path value with the focus dyad value is that the intuitive 
notion of friend-of-a-friend effect that this statistic embodies suggests that 
while the presence of a mutual friend may increase the probability or ex- 
pected value of a particular friendship (i.e., "pull it up"), it should not limit 
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it (i.e., "pull it in") as an absolute difference would. These interpretations 
are somewhat oversimplified: it is just as true that a positive coefficient on 
this statistic results in y^j "pulling up" the potential two-paths between i 
and j. 



In a directed network, (13) would model transitive (hierarchical) triads, 
while 

9{y) = ^ mini y^j, max (min(t/j- fc, j/^ J) j 

would model cyclical (antihierarchical) triads. 

The statistic ( jlsj ) is a fairly conservative one, less likely to induce excessive 
dependence and bimodality, at the cost of sensitivity. More generally, one 
may specify a triadic statistic using three functions: first, f2-path : Nq — )■ M, 
how the "value" of a two-path i — )■ j — )• A; is computed from its constituent 
segments; second, fcombine : — > M, how the values of the possible two- 
paths from i to j are combined with each other to compute the strength of 
the pressure on i and j to close the triad or increase their interaction; and 
third, f affect : No X M — )■ M how this pressure affects Yij. Given these, 

g{y) = '"^Sect{yiJ,Vcomh■me{v2-p^.th{yi,k,yk,j)keN\{^,j})) ■ (14) 

{ij)6Y 

Thus, for example, one could set f combine to sum its arguments rather than 
take their maximum, or one can replace taking the minimum with taking a 



geometric mean. We illustrate the difference it makes in Section 6.2 



6 Examples 

6.1 Example 1: Social relations in a karate club 

In this application, we use a Poisson-reference ERGM to compare impacts 
of social forces — transitivity and homophily — on the structure of a valued 
network of interactions between members of a university karate club. [Zachary] 



(1977) reported observations of social relations in a university karate club 
with membership that varied between 50 and 100. The actors — 32 ordinary 
club members and officers, the club president ("John A."), and the part-time 
instructor ( "Mr. Hi" ) — were the ones who consistently interacted outside of 
the club. Over the course of the study, the club divided into two factions, and. 
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ultimately, split into two clubs, one led by Hi and the other by John and the 
original club's officers. The split was driven by a disagreement over whether 
Hi could unilaterally change the level of compensation for his services. We 



pose a similar question to Goodreau et al. (2008b): is the structure at the 



time of observation driven by faction allegiance or by transitivity ("friend- 
of-a-friend" effects)? 

[Zachary identifies the faction with which each of the 34 actors was aligned 
and how strongly and reports, for each pair of actors, the count of social 
contexts in which they interacted. The 8 contexts considered were academic 
classes at the university; Hi's private karate studio in his night classes; Hi's 
private karate studio where he taught on weekends; student-teaching at Hi's 
studio; the university rathskeller (bar) located near the karate club; a bar 
located near the university campus; open karate tournaments in the area; 
and intercollegiate karate tournaments. The highest number of contexts of 
interaction for a pair of individuals that was observed was 7. The network is 
visualized in Figure |4j 

We begin with a Poisson-reference ERGM. Empirically, this network is 
more sparse than a Poisson density for dyad values would suggest: the mean 
number of dyadwise contexts of interaction (^(^ j)gY 1^1) 0.41, for 
which a Poisson distribution predicts an expected density (E(X](j j)eY ^Y^,j>o/ \^\)) 
of 0.34, whereas the observed density is 0.14. Given that two individuals in- 
teract, the interaction for a given pair of individuals is likely to be dependent 
across the social contexts counted so the counts are likely to be over- or 
under-dispersed. Thus, as a baseline, we model the values as a zero-modified 
Conway-Maxwell-Poisson (Shmueli et al. , 2005) distribution using the fol- 
lowing sufficient statistics: 

baseline propensity to have ties: number of dyads with nonzero value; 
baseline intensity of interactions: sum of dyad values; and 
CMP dispersion: a statistic of the form ([6]). 

In modeling the structure of the interactions, we represent differential 
propensity of the faction leaders to interact, the effect of differences in faction 
membership, and triad-closure bias using the following sufficient statistics: 

intensity of Hi's interaction: sum of dyad values incident on Hi; 

intensity of John's interaction: sum of dyad values incident on John; 

similarity (negative difference) in faction membership: a statistic of 
the form ([s]) with I, where rrij is the faction mem- 

bership code of actor i; and 
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Figure 4: The sociogram of the Zachary (1977) karate club network. The 



color of each plotting symbol shows each actor's faction alignment and its 
shape shows which of the two clubs the actor ultimately joined. Darker and 
thicker lines correspond to more social context of interaction. 
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Table 1: Results from fitting the models to Karate Club network 



Estimates (Std. Errors) 
Term Faction Transitivity Full 



Dispersion 


-2.55 


(0.57) 


-1.87 


(0.61) 


-2.33 


(0.60) 


Ties 


-7.76 


(0.99) 


-7.29 


(1.04) 


-7.54 


(1.01) 


Baseline intensities 


3.97 


(0.68) 


2.88 


(0.75) 


3.64 


(0.74) 


Hi's intensities 


0.80 


(0.15) 


0.50 


(0.12) 


0.71 


(0.15) 


John's intensities 


0.80 


(0.14) 


0.54 


(0.12) 


0.72 


(0.16) 


Faction similarity 


0.27 


(0.04) 






0.25 


(0.04) 


Transitivity 






0.21 


(0.09) 


0.11 


(0.09) 



Coefficients statistically significant at a = 0.05 are bolded. 

Standard errors incorporate the uncertainty introduced by approximating of 



the likelihood using MCMC (Hunter and Handcock, 2006). 



transitivity of intensities: the statistic (13). 

Faction memberships rrii are coded as follows: strongly Hi's as —2, weakly 
Hi's as —1, neutral as 0, weakly John's as +1, and strongly John's as +2. We 
fit three models: the full model, with all of the above-described terms, the 
model excluding transitivity ("Faction"), and the model excluding faction 
membership ("Transitivity"). 

Table [T] gives the results for the three fits. MCMC diagnostics, described 



by Goodreau et al. (2008a), show adequate mixing and networks simulated 
from these fits have, on average, statistics equal to the observed sufficient 
statistics. The CMP dispersion estimates for all three models are negative 
and highly significant, very far from the non-open boundary of the parameter 
space at 6k < 1, so the lack of steepness is unlikely to be problematic in this 
case. The estimated value of the dispersion parameter for the full model 
(—2.33) suggests strong underdispersion relative to zero-modified Poisson 
and the rest of the model: it implies that the estimated "denominator" is 
{yijiy~''~'^'^^^ = iVij^-)^'^^, rather than Poisson's {y^jiy. Highly negative 
CMP coefficients may also be interpreted as the model being an overfit. 

There is a highly significant negative coefficient on the baseline propensity 
for ties. An interpretation for this is that, from the point of view of a single 
dyad, the probability of a given pair of actors having a tie is deflated, but 
if they do have a tie, it is likely to be across multiple social contexts. Both 
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faction leaders appear to have greater overall propensities to interact than 
the other club members, and, interestingly, they appear to have similar effect 
sizes to each other. 

Taken separately, the faction similarity effect is highly statistically sig- 
nificant and positive, indicating a positive faction cohesion. The transitivity 
effect is significant by the Wald test, but a Monte Carlo test gives its one- 
sided (since only positive transitivity is of interest) P-value as 0.11. Put 
together, the transitivity loses any potential significance. (Notably, the es- 
timated correlation between their parameter estimates is —0.34.) This sug- 
gests that they are explaining the same aspect of the network structure, but 
that faction allegiance is the much stronger explanation. Though factions 
may themselves be endogenous due to influence through social relations or, 
as [Zachary] concludes, the two processes reinforced each other over time, at 
observation time, faction allegiance explains network structure better than 
transitivity. 



6.2 Example 2: Interactions in a fraternity 



In a series of studies in the 1970s, Bernard et al. (1979-1980) assessed accu- 



racy of retrospective sociometric surveys in a number of settings, including 
a college fraternity whose 58 occupants had all lived there for at least three 
months. To record the true amounts of interaction, for several days, un- 
obtrusive observers were sent to periodically walk through the fraternity to 
note students engaged in conversation. Obtaining these network data from 



Batagelj and Mrvar (2006), we model these observed pairwise interaction 



counts. 

The raw distribution of counts, given in Figure 5a, appears to be strongly 
overdispersed relative to Poisson, and, indeed, relative to the geometric dis- 
tribution: the mean of counts is 2.0, while their standard deviation (not 
variance) is 3.4. At least some of this is due to actor heterogeneity: the 
square root of the within-actor variance of the counts is 3.1. Excluding ex- 
treme observations (all values over 30) does not make a qualitative difference. 
(The statistics are 1.9, 3.0, and 2.8, respectively.) Nor does there appear a 
natural place to threshold the counts to produce a binary network. (See 
Figure 5b ) We thus model the baseline shape of the distribution of counts 



using the following terms: 

baseline propensity to have ties: number of dyads with nonzero value; 
baseline intensity of interactions: sum of dyad values; and 
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Figure 5: Conversation count summaries for Bernard and Killworth's fra- 
ternity network 



underdispersion: the statistic 
(We have also attempted to use CMP (via (|6])) but found the process to be 
unstable due to the greater-than-geo metric level of dispersion.) 

Little was recorded about the social roles of the fraternity members, so 
we consider the effects of endogenous social forces: 
actor heterogeneity: the undirected version of (12); 
transitivity of intensities: the statistic (13). 



Faust (2007), in particular, found that in many empirical networks, much of 



the apparent triadic effects are accounted for by variations in degree distribu- 
tion and other lower-order effects. Thus, we consider four models: baseline 
shape only (B), baseline with heterogeneity (BH), baseline with transitivity 
(BT), and all terms (BHT), to explore this concept in a valued setting. 
We report the model fits in Table [2j MCMC diagnostics, described by 



Goodreau et al. (2008a), show adequate mixing and unimodal distributions of 



sufficient statistics, and networks simulated from these fits have, on average, 
statistics equal to the observed sufficient statistics. The baseline dyadwise 
distribution terms are difficult to interpret, but the highly negative coefficient 
on underdispersion suggests a a strong degree of overdispersion, as expected. 
Some of this overdispersion appears to be absorbed by modeling actor het- 
erogeneity, however. There are indications a high degree of heterogeneity in 
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Table 2: Results from fitting the models to Bernard and Killsworth's frater- 
nity network 







Estimates (Std. Errors) 




Term 


B 


BH 


BT 


BHT 


Ties 


5.60 (0.21) 


4.96 (0.17) 


6.24 (0.21) 


4.98 (0.17) 


Intensity 


3.65 (0.05) 


3.13 (0.06) 


3.40 (0.07) 


3.12 (0.06) 


Underdispersion 


-9.71 (0.22) 


-8.23 (0.20) 


-10.52 (0.22) 


-8.26 (0.19) 


Heterogeneity 




1.48 (0.06) 




1.46 (0.07) 


Transitivity 






0.46 (0.05) 


0.03 (0.04) 



CoefRcients statistically significant at a = 0.05 are bolded. 

Standard errors incorporate the uncertainty introduced by approximating of the likelihood 
using MCMC ([Hunter and Handcock[ |2006[). 



individuals' interactions, over and above that expected for even the overdis- 
persed baseline distribution. (Monte Carlo P-val. < 0.001 based on 10,000 
draws.) 



Without accounting for actor heterogeneity (i.e.. Model BT), there ap- 
pears to be a strong transitivity effect — a friend of a friend is a friend — 
and the Monte Carlo test confirms this with a similar P-value. However, 
if actor heterogeneity is accounted for, the transitivity effects vanish (simu- 
lated one-sided P-val. = 0.43), suggesting that the underlying social process 
is better explained by a relatively small number of highly social individu- 
als whose ties to each other and to (less social) third parties create excess 
transitive ties for the overall amount of interaction observed. At the same 
time, if, instead of using (13) as the test statistic, we use a less conserva- 
tive statistic of the form (14) with f2-path(a;i, ^2) = ^Jx\X^2. (geometric mean), 
fcombine(a;i, . . . ,a;n_2) = YTk=\^^^ t^affect(a;i,X2) = a/xIx^, the effect's 
significance seems to increase (one-sided P-val. = 0.07). However, when 
we attempted to fit the model with this effect, the process exhibited the 
degeneracy-like bimodality. This suggests that there is a trade-off between 
stability and power to detect subtle effects. 
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7 Discussion 



We have generalized the exponential-family random graph models to net- 
works whose relationships are unbounded counts, explored the issues that 
arise when generalizing, and proposed ways to model several common net- 
work features for count data. We demonstrated our development by analyz- 
ing two very different networks to examine the interaction of friend-of-a- friend 
effects with homophily and individual heterogeneity. 

This paper focused on modeling counts. More generally, one can express 
a valued ERGM by replacing the set of possible dyad values No by a more 
general set S and replacing h{y) with a more general a- finite measure space 
(3^, Y, Ph) with reference measure Ph, then postulating a probability measure 
Pe;Ph,'n,g with Radon-Nikodym derivative of Po;Pi^,r),g with respect to P^, 

dPe;P,,,ri,g ^ exp (77(0) ■ g{y)) 



dPh 



(Barndorff-Nielsen, 1978 pp. 115-116; Brown 1986 pp. 1-2) with the nor- 
malizing constant 



exp {'n{0)-g{y))dPn{y). 



y 



For binary and count data, and discrete data in general, Ph could be specified 
as a function relative to the counting measure, while for continuous data, it 
could be defined with respect to the Lebesgue measure. Still, as with count 
data, the shape of this function would need to be specified. 

Other scenarios might call for more complex specifications of the refer- 
ence measure. Some network data, such as measurements of duration of 



conversation (Wyatt et al. , 2010) and international trade volumes (Westveld 



and Hoff, 2011) are continuous measurements except for having a positive 



probability of two actors not conversing at all or two countries having no 
measured trade. IWestveld and Hoffl use a normal distribution to model the 
log-transformed trade volume, imputing = log(l) for observed trade vol- 
umes (all nonzero trade volumes being greater than 1 unit), and they note 
this issue and address it by pointing out that in their (latent- variable) model, 
an impact of such an outlier would be contained. Valued ERGMs may pro- 
vide a more principled approach by specifying a semi continuous P/i, such as 
one that puts a mass of 1/2 on and 1/2 on Lebesgue measure on (0, 00). 
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We have also focused on data that do not impose any constraints on the 
sample space: 3^ = S^. But, some types of network data, such as those where 



each actor ranks the others (Newcomb, 1961) may be viewed as imposing 
a more complex constraint on sample spaces: setting S = {1..?t, — 1} and 
constraining y to ensure that each ego assigns a unique rank to each alter 
gives a sample space of permutations that could, with a counting measure, 
serve as the reference measure for an ERGM on rank data. These, and other 
applications are a subject for ongoing and future work. 

This paper focuses on models for cross-sectional networks, where a sin- 
gle snapshot of relationship states or relationships aggregated over a time 
period are observed. For longitudinal data, comprising multiple snapshots 
of networks over the same actors over time, binary ERGMs have been used 
as a basis for discrete-time models for network tie evolution by 'Robins and 



Pattison 


( 


2001 


), 


Wyatt et al. 


and 


Krivitsky and Handcock 



( 2009 [2010| ), [Hanneke, Fu, and Xing, ( |2010| ), 
pOlot . Val ued ERGMs can be di rectly ap- 
plied to the discrete temporal ERGMs of Hanneke et al. (2010) although 
their adaptation to the work of Krivitsky and Handcock (2010) may be less 
straightforward, especially if the benefits to interpretability of the separable 
models are to be retained. 

In practice, networks are not always observed completely. [Handcock and 



Gile (2010) develop an approach to ERGM inference for partially observed 



or sampled binary networks. It would be natural to extend this approach to 
valued networks and valued ERGMs. 

Some methods for assessing a network model's fit, particularly MCMC 
diagnostics (Goodreau et al. , 2008a) can be used with little or no modifi- 



cation. Others, like the goodness-of-fit methods of Hunter, Goodreau, and 



Handcock (2008a) may require development of characteristics meaningful for 



valued networks. It may also be possible to extend the stability criteria of 



Schweinberger (2011) to models with infinite sample spaces. 
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A A sampling algorithm for a Poisson-reference 
ERGM 

We use a Metropolis-Hastings sampling algorithm (Algorithm [T]) to sample 
from a Poisson-reference ERGM, using a Poisson kernel with its mode at the 
present value of a dyad and, occasionally (with a specified probability ttq), 
proposing a jump directly to 0. Because, as we discuss in Section [5.2.2[ counts 
of interactions are often zero-inflated relative to Poisson, setting ttq > can 
be used to speed-up mixing. For highly overdispersed distributions, a Poisson 
kernel may be trivially replaced by a geometric or even negative-binomial 
kernel. 

This algorithm selects the dyad on which the jump is to be proposed at 
random. A possible improvement to this algorithm would be to adapt to it 



the tie-no-tie (TNT) proposal (Morris et al. , 2008), which optimizes sampling 



in sparse (zero-inflated) networks by focusing on dyads which have a nonzero 
values. 



B Non- steepness of the Conway— Max well— 
Poisson family 

Expressed in its exponential-family canonical form, a random variable X 
with the Conway-Maxwell-Poisson distribution has the pmf 



Pr0;^,g(X = x) = 
with the normalizing constant 



exp {Oix + 62 log(x!)) 



Kr,,g(6') 



X G No 



x'=0 



Theorem B.l. The Conway-Maxwell-Poisson family is not regular. 
Proof. The natural parameter space of CMP is 

0N = {0' G : 6/2 < V (6>2 = A 01 < 0)} 



(Shmueli et al. , 2005). Due to the boundary at 62 = 0, ©n is not an open 
set, and hence the family is not regular (Brown, 1986, p. 2). □ 
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Algorithm 1 Sampling from a Poisson-reference ERGM with no constraints, 
optimized for zero-inflated distributions 



Let: 



RandomClioose(y4) return a random element of a set A 
Uniform(a, b) return a random draw from the Uniform(a, b) 
distribution 

Poisson^j/(A) return a random draw from the Poisson(A) distribution, 
conditional on not drawing y 



Input: G y, T sufficiently large, Y, g, r], ttq E [0, 1) 
Output: a draw from the specified Poisson-reference ERGM 
1: for t ^ 1..T do 

2: RandomChoose(Y) {Select a dyad at random.} 

3: if y^ j 7^ A Uniform(0, 1) < ttq then 
4: I/* {Propose a jump to with probability ttq.) 
5: else 



p{y*;y) 



exp(-(y+|))(y+i)'' /y*\ 
l-exp(-(y+i))(2/+|)Vj/! 



,1' 



the pmf of a Poisson^2^(j/ -|- |) draw 



8: 



6 



7: 




9 



if Uniform(0, 1) < r then 



10: 



y^*^ ^ y[i,j)=y* {Accept the proposal.} 



11 



else 



13 



12 



yW ^ yit 1) {Reject the proposal.) 
return y^'^^ 
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Theorem B.2. The Conway-Maxwell-Poisson family is not steep. 

Proof. A necessary and sufficient condition for a non-regular exponential 
family to be steep is that 

V0e0N\®° Ee;r,,g{\\g{X)\\) = oo, 

where 0^ is the open interior of ©n, and their set difference is thus the 
non-open boundary of the natural parameter space that is contained within 
it. (Brown, 1986, Proposition 3.3, p. 72) For CMP, this boundary 

0N\e^ = {0' ER^ : 02 = A0i < 0}. 

There, X ~ Geometric(p = 1 — exp{0i)). Noting that X > a.s., 
log(X!) > a.s., and log(x!) < (x + 1) log {^) + 1, 

Ee;r,,g{\\9iX)\\) = EGeometric(p=l-exp(0,))(ll[^>log(^')]^ll) 

< EGeometric(p=l-oxp(0i)) {X + log(X!)) 

< EGeometric(p=l-exp{0i)) (^X + (X + 1) log ^ ^ + 1^ 

— EGeometric(p=l— cxp(0i)) 

(X + (X + 1)2 + 1) 

< oo, 

since the ffist and second moments of the geometric distribution are finite. 
Therefore, CMP is not steep. □ 

Because the non-steep boundary corresponds to the most dispersed dis- 
tribution that CMP can represent, maximum likelihood estimator properties 
for data which are highly overdispersed are not guaranteed. 
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